Effectiveness of web search results for genre and sentiment classification

Authors

  • Jin-Cheon Na
  • Tun Thura Thet

Abstract

The motivation of this study is to enhance general topical search with sentiment-based search, in which the search results returned by a Web search engine (called snippets) are clustered by sentiment category. First, we developed an automatic method to identify product review documents from their snippets (summary information comprising the URL, title, and summary text); this task is treated as genre classification. The snippets identified as product reviews were then automatically classified as positive (recommended) or negative (not recommended) documents; this is sentiment classification. The user can then go directly to the positive or negative review documents. In this study we used only the snippets rather than the original full-text documents. We applied a common machine learning technique, SVM (Support Vector Machine), together with heuristic approaches to investigate how effectively snippets can be used for genre and sentiment classification. For genre classification, a hybrid approach performed slightly better than machine learning alone. It combined machine learning on n-gram terms with a heuristic based on the Title, Summary Text, and URL. The best result was 84.79% accuracy on 400 unseen snippets. For sentiment classification, we used only snippets from the PC Magazine review site, with their summary texts as the sole document feature. This tested whether the search engine provides summaries of sufficient quality for sentiment classification. The classification accuracy was rather low: the best result was 70.73% accuracy on 164 unseen snippets. The experimental results and error analyses show that Web search engines should improve the quality of snippets, especially for opinionated documents (i.e., review documents).
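The two-stage pipeline described in the abstract (genre filtering, then sentiment labelling, both on snippets only) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The review cue words, the OR rule that combines the heuristic with the n-gram score, the example URLs, and all weight values are hypothetical. SVM training itself is omitted: the weight dictionaries stand in for a learned linear model, and only the decision step (score compared with 0) is shown.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    """One search-engine result: URL, title and summary text."""
    url: str
    title: str
    summary: str


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter, digit or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams for n = 1..n_max (unigrams and bigrams by default)."""
    tokens = tokenize(text)
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def linear_score(feats: Counter, weights: dict[str, float], bias: float = 0.0) -> float:
    """Decision value w.x + b, the quantity a trained linear SVM thresholds at 0."""
    return bias + sum(weights.get(f, 0.0) * c for f, c in feats.items())


# Hypothetical cue words for the heuristic; not taken from the paper.
REVIEW_CUES = frozenset({"review", "reviews", "rating", "verdict"})


def heuristic_is_review(s: Snippet) -> bool:
    """Rule-based genre check: does any cue word appear in the URL, Title or Summary Text?"""
    return any(REVIEW_CUES & set(tokenize(field)) for field in (s.url, s.title, s.summary))


def hybrid_is_review(s: Snippet, weights: dict[str, float], bias: float = 0.0) -> bool:
    """Hypothetical OR-combination: the heuristic fires or the n-gram score is positive."""
    if heuristic_is_review(s):
        return True
    return linear_score(ngrams(f"{s.title} {s.summary}"), weights, bias) > 0


def classify_sentiment(s: Snippet, weights: dict[str, float], bias: float = 0.0) -> str:
    """Label a review snippet using its summary text as the only feature source."""
    score = linear_score(ngrams(s.summary), weights, bias)
    return "positive" if score >= 0 else "negative"


# Toy weights standing in for trained SVM models (hypothetical values).
GENRE_W = {"review": 1.0, "buy now": -1.5}
SENT_W = {"excellent": 1.0, "disappointing": -1.0, "not recommended": -2.0}

s1 = Snippet("https://example.com/acme-x100-review", "Acme X100",
             "An excellent laptop with great battery life.")
s2 = Snippet("https://example.com/shop/x200", "Acme X200 - Buy now",
             "Buy now and save 20%.")
s3 = Snippet("https://example.com/x300-review", "Acme X300",
             "Disappointing screen; not recommended.")

print(hybrid_is_review(s1, GENRE_W))    # True (heuristic: "review" in URL)
print(hybrid_is_review(s2, GENRE_W))    # False ("buy now" x2 gives score -3.0)
print(classify_sentiment(s1, SENT_W))   # positive
print(classify_sentiment(s3, SENT_W))   # negative
```

In practice the weight dictionaries would be learned by training a linear SVM on labelled snippets. Only the summary text feeds the sentiment step here, mirroring the PC Magazine experiment in which summary texts were the sole document feature.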

Similar sources

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter allows users to publish tweets saying what they think about various topics. There are numerous websites on the Internet built around Twitter. The user can enter a sentiment ta...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One problem that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of tweets. To this end, we create subjectivity lexicons in which the words into ...

Performance Improvement of Web Page Genre Classification

Given the dynamic nature of the web and the growing number of web pages, it is very difficult to find the required pages quickly among the thousands retrieved by a search engine. The solution to this problem is to classify web pages according to their genre. Automatic genre identification of web pages has become an important area in web page classification, because...

Image classification for Web genre identification

With countless existing websites and the virtually unrestricted growth of the World Wide Web, the Web has no boundaries. As a result, there is an increasing need to automatically categorize and classify websites into genres in order to improve the personalization of search results. This paper will offer conceptual suggestions on how online images can be used to predict the ...

Automatic Music Genre Identification

Nowadays, automatic analysis of music signals has gained considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper, several techniques were implemented and evaluated for music genre classification, including feature extraction, feature selection and m...

Journal:
  • J. Information Science

Volume: 35

Publication date: 2009